Learning a Robust Word Sense Disambiguation Model using Hypernyms in Definition Sentences
Abstract
This paper proposes a method to improve the robustness of a word sense disambiguation (WSD) system for Japanese. Two WSD classifiers are trained from a word sense-tagged corpus: one is a classifier obtained by supervised learning, the other is a classifier using hypernyms extracted from definition sentences in a dictionary. The former will be suitable for the disambiguation of high frequency words, while the latter is appropriate for low frequency words. A robust WSD system will be constructed by combining these two classifiers. In our experiments, the F-measure and applicability of our proposed method were 3.4% and 10% greater, respectively, compared with a single classifier obtained by supervised learning.
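The abstract does not say how the two classifiers are combined, so the following is only a minimal sketch of one plausible reading: a back-off scheme in which the supervised classifier answers for words with enough tagged examples, and a hypernym-based classifier, which pools training data across all words whose senses share a hypernym, answers otherwise. The toy corpus, the `HYPERNYM_OF` table, the count-based scoring, and the `min_freq` threshold are all illustrative assumptions, not details from the paper.

```python
from collections import Counter, defaultdict

def train_supervised(corpus):
    """Count context words per (word, sense) pair and tagged instances per word."""
    counts = defaultdict(Counter)   # (word, sense) -> context word counts
    freq = Counter()                # word -> number of tagged instances
    for word, sense, context in corpus:
        counts[(word, sense)].update(context)
        freq[word] += 1
    return counts, freq

def train_hypernym(corpus, hypernym_of):
    """Count context words per hypernym, pooling instances of every word."""
    counts = defaultdict(Counter)   # hypernym -> context word counts
    for _word, sense, context in corpus:
        if sense in hypernym_of:
            counts[hypernym_of[sense]].update(context)
    return counts

def best(scores):
    """Return the top-scoring key, or None when nothing scored above zero."""
    if not scores:
        return None
    key = max(scores, key=scores.get)
    return key if scores[key] > 0 else None

def disambiguate(word, context, senses, sup, hyp, hypernym_of, min_freq=3):
    """Back off from the supervised classifier to the hypernym classifier."""
    sup_counts, freq = sup
    # Assumed rule: trust the supervised model only for frequent words.
    if freq[word] >= min_freq:
        scores = {s: sum(sup_counts[(word, s)][c] for c in context)
                  for s in senses[word]}
        choice = best(scores)
        if choice is not None:
            return choice, "supervised"
    # Otherwise score each candidate sense through its hypernym.
    scores = {s: sum(hyp[hypernym_of[s]][c] for c in context)
              for s in senses[word] if s in hypernym_of}
    choice = best(scores)
    return (choice, "hypernym") if choice is not None else (None, "none")

# Hypothetical data: hypernyms as they might be extracted from definitions.
HYPERNYM_OF = {"bank#1": "institution", "bank#2": "land",
               "trust#1": "institution", "trust#2": "feeling"}
SENSES = {"bank": ["bank#1", "bank#2"], "trust": ["trust#1", "trust#2"]}
CORPUS = [
    ("bank", "bank#1", ["loan", "money"]),
    ("bank", "bank#1", ["money", "account"]),
    ("bank", "bank#2", ["river", "water"]),
]

sup = train_supervised(CORPUS)
hyp = train_hypernym(CORPUS, HYPERNYM_OF)
result_bank = disambiguate("bank", ["river"], SENSES, sup, hyp, HYPERNYM_OF)
result_trust = disambiguate("trust", ["loan"], SENSES, sup, hyp, HYPERNYM_OF)
```

In this sketch "trust" never occurs in the tagged corpus, yet it can still be disambiguated because its candidate sense shares the hypernym "institution" with a frequent sense of "bank". That is the intuition behind the higher applicability reported in the abstract, though the actual system may differ substantially.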
Similar resources
Word Sense Disambiguation Using WordNet Relations
In this paper, the “Weighted Overlapping” Disambiguation method is presented and evaluated. This method extends Lesk’s approach to disambiguate a specific word appearing in a context (usually a sentence). Sense definitions of the specific word, “Synset” definitions, the “Hypernymy” relation, and definitions of the context features (words in the same sentence) are retrieved from the WordNe...
Word Sense Disambiguation Using Vectors of Co-occurrence Information
This paper reports on the word sense disambiguation of Korean nouns using co-occurrence information in context. For a given noun, its local contextual word distribution is not enough to express its semantic characteristics for noun sense disambiguation. This paper proposes a cluster-based sense as a base vector. Contextual noise is removed by a term weighting method, and hypernyms of remain...
Automatic Classification of Bengali Sentences Based on Sense Definitions Present in Bengali WordNet
Based on the sense definition of words available in the Bengali WordNet, an attempt is made to classify the Bengali sentences automatically into different groups in accordance with their underlying senses. The input sentences are collected from 50 different categories of the Bengali text corpus developed in the TDIL project of the Govt. of India, while information about the different senses of ...
Automatic Idiom Identification in Wiktionary
Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed ...
Analogical Word Sense Disambiguation
Word sense disambiguation is an important problem in learning by reading. This paper introduces analogical word-sense disambiguation, which uses human-like analogical processing over structured, relational representations to perform word sense disambiguation. Cases are automatically constructed using representations produced via natural language analysis of sentences, and include both conceptua...